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Abstract. - We introduce a statistical method to detect nonlinearity and nonstationarity in 
time series, that works even for short sequences and in presence of noise. The method has a 
discrimination power similar to that of the most advanced estimators on the market, yet it depends 
only on one parameter, is easier to implement and faster. Applications to real data sets reject 
the null hypothesis of an underlying stationary linear stochastic process with a higher confidence 
interval than the best known nonlinear discriminators up to date. 
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Natural phenomena are often studied through measures 
of physical observables that change over time. Hence, time 
series analysis is of extraordinary importance for the com- 
prehension and the characterization of a physical process. 
The analysis of a time series should generally be able to de- 
tect, within some confidence level, the stochastic or deter- 
ministic nature of the underlying process, and eventually 
to quantify the degrees of freedom, the presence of non- 
linearity or nonstationarity, and finally the predicability 
of the future states. Experimental time series are affected 
by measurement error, and they are often corrupted by 
unknown noise sources. In addition to this, while some 
processes, such as laser emissions [1] or network traffic [2], 
can produce a large amount of data in few hours, other 
natural processes, such as sunspots (3H3 or seismic events 
[B], may require long times of observation to obtain rel- 
evant informations. As a consequence, methods of time 
series analysis should be able to work on both short and 
long noisy series. 

The surrogate data method provides a rigorous statisti- 
cal approach to the nonlinear features detection of a time 
series [BIB]- The method consists in formulating a null 
hypothesis, e.g. "the time series is generated by a linear 
and stationary stochastic process" , an alternative hypoth- 
esis, e.g. "the time series is not generated by a stationary 
linear stochastic process" , and in preparing a set of iV 
constrained surrogate time series, generally with the same 
linear statistical features as the original one. One or more 
observables, so-called estimators, discriminators, or dis- 



criminating statistics, are obtained from both the original 
time series and for the surrogates representing the null 
hypothesis: by performing a nonparametric test, as the 
rank order, observables are statistically analysed and the 
rejection of the null hypothesis is claimed within a certain 
confidence level. As pointed out in Rcfs. 7,8 , a large 
number of different measures have been considered over 
the years to detect nonlinearity in time series: higher order 
statistics, time reversal asymmetry, correlation dimension 
[5], largest Lyapunov exponent, nonlinear prediction error 
(NPE) [5HTT] and Volterra-Wiener-Korenberg polynomi- 
als [12], just to mention those most commonly adopted. 
However, some of the measures either have low discrimi- 
nation power [3] , or are not able to distinguish chaos from 
coloured noise, as the correlation dimension [T5HT5] . or re- 
quire long time series, generally not available in the real 
world. In particular, the NPE, a method based on phase- 
space reconstruction and that requires three parameters, 
has the highest discrimination power on short and noisy 
time series, and gives either better or comparable perfor- 
mance than other methods [5]. 

In this Letter, we introduce a fast method to reject the 
null hypothesis of an underlying stationary linear stochas- 
tic process. The method does not require any embedding 
procedure for phase space reconstruction, and is based on 
the evaluation of the differences between the original time 
series and the stationary linear model that best approxi- 
mates it. The kurtosis of the distribution of differences is 
finally used as discrimination statistics to reject the null 
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hypothesis. The method works well with both determin- 
istic and stochastic series, either time continuous or dis- 
crete. It is simpler to compute and faster than other ex- 
cellent discriminators, and it appears to be robust even for 
very short time series, highly corrupted with measurement 
noise. Indeed, as an application to a still open question, 
we will show how the method improves up to 98% the 
confidence level for the rejection of null hypotheses in the 
cases of the monthly smoothed sunspots index. 

Given a ^-samples univariate time series {s n }, with 
n = 1,2, we produce a set {si''}, i = 1,2, ...,N, 

of N surrogates of {s n }. Each of the surrogate is a time 
series having the same mean, variance, density function, 
power spectrum and, thus, autocorrelation function as the 
original time series {s n }. Different procedures to pro- 
duce surrogates fulfilling several requirements have been 
introduced over the years [T51|T7] . In particular, here, we 
adopt an iterative amplitude adjusting Fourier transform 
(IAAFT) scheme |16lfT8] . Since the density function and 
the power spectrum of {s„} and {si''} are the same, non- 
linearity or nonstationarity, if present, should emerge from 
the analysis of the deviations from the best stationary lin- 
ear approximation of the time series. The simplest sta- 
tionary linear model is the autoregressive AR(p) of order 
p, defined as 
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where the integer p is the memory range of the model, and 
e n is a stochastic stationary process with (e) = 0. To en- 
sure the stationarity of the process, the coefficients in 
Eq. ([T]) have to be chosen such that the roots of the char- 
acteristic polynomials: 1 — Sf=i a i z% = ne outside the 
unit circle. AR(p) are well known models used in time se- 
ries fitting problems, and a wide literature covers the exact 
and efficient estimation of the coefficients ai (see Ref. [T9] 
for an extensive and up-to-date review). Here, we adopt 
a least squares approach, solved by a QR decomposition 
algorithm. As a result of fitting the original time series 
{s n } with an AR(p), we get the coefficients cii, the model 
{i„}, and the residuals scries {e„}, defined as — s^—Xk 
for k = l,2,...,?i. As for the order p of the autoregres- 
sive model that best approximates the original time series 
{s„}, we have chosen the one that minimizes an informa- 
tion criterion, according to Akaike [50] and Schwarz [2"Tj . 
The estimator of linearity and stationarity we propose is 
the kurtosis of the residuals: 



K = 



(g-(g)) 4 ) 
(e-(~e)) 2 ) 2 



(2) 



The same fitting procedure is repeated for each of the N 
surrogate series and the obtained estimators are 

respectively indicated as K,^ . Finally, the test against 



the null hypothesis is based on comparing K, to the dis- 
tribution of £W. We name the discriminator in Eq. [2] 
autoregressive- fit residuals kurtosis (ARK). 

Summing up, the method consists, in practice, in the 
following steps: 

1. Given a time series {s n }, obtain the best stationary 
linear model that minimizes an information criterion. 

2. Build the time scries e„ of residuals and evaluate the 
kurtosis of their density. 

3. Fix a significance level a, and create N = 2 /a — 1 
surrogates of {s n }. Repeat points 1 and 2 for each 
surrogate time series 

4. Finally, compare the values of the discriminator in 
Eq. (|2|) obtained for {s n } and {s$} with a two-sided 
rank order test of size a. 

We have tested ARK with a wide variety of short time 
series, either from models and from real data sets. 

Synthetic time series. We first considered stationary 
linear autoregressive moving-average models (ARMA), i.e. 
discrete null models. We performed an extensive and sys- 
tematic analysis of ARK statistics, by varying both au- 
toregressive and moving average orders. As expected, for 
a discriminating statistics evaluated on null models, we 
obtained a flat density of p-values. The same result was 
obtained when considering stationary linear Langevin pro- 
cesses, i.e. continuous null models. It follows that ARK is 
not biased against the null hypothesis. 

We then addressed autoregressive moving-average mod- 
els in nonstationary regime, and several well known non- 
linear models, both in chaotic and non chaotic regime. 
In particular, we tested the stability of the method in 
discriminating nonlinearity and nonstationarity under the 
contamination of stochastic noise, correlated or uncorre- 
cted, of increasing strength, by calculating the probabil- 
ity (3 to accept the null hypothesis when the alternative is 
true (also known as type II error). For each simulated time 
series with variance , we added artificial noise with vari- 
ance a\ , and we studied the so-called power 1 — (3 of ARK 
by varying the ratio a e /a s . In practice, the results ob- 
tained where the method has a power of less than 70% are 
considered questionable [pj. In Fig. [T]we report the power 
1 — P versus the relative noise level a e /a s , for different dy- 
namical systems. The results were obtained with 10 4 rank 
order tests of size a = 1% (N — 199), corresponding to a 
confidence level of 99%. We considered both deterministic 
and stochastic series, either time continuous or discrete. In 
the case of noise-free nonlinear or nonstationary time se- 
ries, we get a discrimination power close to 100%. Even in 
the presence of noise, the ARK exhibits an excellent dis- 
crimination power. For all the considered dynamical sys- 
tems, a sequence of I = 1024 samples is enough to achieve 
a discrimination power larger than 70%, even for relative 
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Figure 1: ARK power versus noise strength for: a) Mackey- Glass map in the chaotic regime with the addition of correlated 
noise, b) Ikeda map in the chaotic regime, with uncorrelated noise, c) nonstationary autoregressive moving average (ARMA) 
with correlated noise, d) Lorenz model in chaotic regime with uncorrelated noise. 



noise levels as large as 40% in the case of maps, and as 
large as 150% in the case of the nonstationary ARMA 
and of the Lorenz system. Notice that adding a relative 
noise level of 150% to a time series corresponds to super- 
imposing a noise amplitude that is 1.5 times the intrinsic 
fluctuations of the original series. Indeed, in the case of 
a chaotic dynamics, such as the Lorenz system reported 
in panel d), it is possible to reach a discrimination power 
close to 100% also for shorter length (I = 768). This result 
is particularly relevant if compared against the result from 
NPE: in presence, for instance, of a relative noise level of 
150%, the phase space of the Lorenz model is not well re- 
constructed through the delay embedding method, and, in 
fact, a power of no more than 30% is achieved. Hence, the 
results in Fig. Q] show that our procedure is suited for the 
analysis of real datasets, generally noisy (a e /a s < 1) time 
series of few thousands of samples. 

Real data sets. The first nonlinear experimental time 
series we have considered is a sample of I = 1000 values 
of the intensity of a Far Infrared Laser (FIR) in a chaotic 
state (see Ref. [1]). In Fig. [2£i we show the rank ordered 
ARK values for both the FIR time series and 1000 of its 
surrogates: as expected, the rank order test rejects the 
null hypothesis with 99.8% confidence level. 

The second nonlinear experimental time series we have 
considered is a sample of I = 4096 values of intracranial 
EEG recordings during epileptic seizures (see Ref. [Hj ) . In 
Fig. 03 we show the rank ordered ARK values for both the 
EEG time series and 1000 of its surrogates: as expected, 
the rank order test rejects the null hypothesis with 99.8% 
confidence level. 

As third application to real databases, the discrimina- 
tion power of ARK is tested against time series of sunspots 
index. In particular, we consider the monthly smoothed 
sunspots index from 1749 to 2008, and the 11-years aver- 
aged sunspots index reconstructed up to the past 11,400 



















] ill nil 

| f I Wll'illi. 1 


- 






10O 200 500 400 50 


600 TOO 800 900 1000 




Sample, n 






. . 1 - 



ARfit Residuals Kurtosis 



a) 




ARfit Residuals Kurtosis 



b) 



Figure 2: Rank order test (1000 surrogates) for the Far Infrared 
Laser intensity time series (a) and for an EEG recording during 
epileptic seizure (b). The arrow indicates the value of ARK for 
the original time series, dots indicate the rank ordered values 
of ARK for their surrogates. 



years. The sunspots index, introduced by Wolf in 1848, is 
strongly related to the solar cycle discovered by Schwabe 
in 1843. Such series shows strong irregularities, partially 
explained by magnetohydrodynamics of dynamo, and a 
quasi-periodic behavior. However, the real nature of solar 
cycle is still debated, as shown in the following. Gur- 
buz and Beck [23] claimed that the dynamics of successive 
sunspot maxima is low-dimensional with features similar 
to the intermittent logistic map. Barnes [23] proposed 
an ARMA(2,2) model mapped by a nonlinear function 
to reproduce the solar cycle with its statistical features 
and its irregular and unpredictable behavior. In conflict 
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with this result, a randomly driven nonlinear oscillator 
was proposed for the first time by Palus and Novotna [3] 
by using the mutual dependence of the instantaneous am- 
plitude and frequency of sunspot series as discriminator in 
a surrogate data test, to reject the null hypothesis of an 
underlying Barnes model [24j . The mapping to a scalar 
time series from the spatio-temporal magnetic field de- 
scribed by a nonlinear, eventually stochastically, driven 
partial differential equation for the magnctohydrodynamic 
dynamo, was not excluded from the results (4j[5] . Sunspot 
irregularities were attributed to the stochastic fluctuation 
in one of the parameters of a Van Der Pol nonlinear oscil- 
lator describing the irregular periodic magnetic field |25j . 
while recently, an overembedding approach introduced for 
modeling and prediction of nonstationary systems was suc- 
cessfully applied to the series with high precision |26j . 




Figure 3: Rank order test for the sunspots index, averaged on 
11-years, reconstructed up to the past 11,400 years against 1000 
Barnes null models (a), and for the monthly smoothed sunpost 
index from 1749 to 2008 and 100 surrogates (b). The arrow 
indicates the value of ARK for the original time series, dots 
indicate the rank ordered values of ARK for their surrogates. 

We considered the series of I = 3123 samples for the 
monthly smoothed sunspots index, released by the Solar 
Influences Data analysis Center [27] from 1749 to 2008, 
and the historical series of I = 1113 samples for the 
sunspots index, averaged over 11-years, reconstructed up 
to the past 11,400 years (MSI) through indirect methods 
28,29]. Standard estimators as the Lyapunov exponent, 
the correlation dimension, and the increase of the predic- 
tion error with the prediction horizon can lead to spurious 
results when applied to short time series, as shown for 
instance in Rcf. |25j and references therein. 

Here, we have applied ARK both to the I = 1113 and 
to the I = 3123 series, performing surrogate tests against 
different null models. If a null hypothesis can not be re- 
jected, the sunspots index time series is not distinguish- 



able from the null model: thus either it is well described 
by the model or the used statistics has not enough dis- 
crimination power. We started by considering the short 
time series and Barnes time series as null models. Fig. 
[2k shows the rank ordered ARK values for the historical 
series and 1000 of its nulls: we reject the null hypothe- 
sis of an underlying Barnes process with 99.8% confidence 
level. Successively, the long time series was tested against 
I A AFT surrogates. Fig. [3p shows the rank ordered ARK 
values for the monthly series and 100 of its surrogates: 
we reject the null hypothesis of an underlying stationary 
linear stochastic process with 98% confidence level. 

We verified that one of the best test statistics up to date 
[9], namely the nonlinear prediction error, gives similar re- 
sults. Indeed, we emphasize that NPE makes use of three 
parameters, the lag time is, the embedding dimension M 
and the size of neighbourhoods, while ARK does not need 
any phase space reconstruction, and only uses one param- 
eter. In addition to this, our numerical experiments on 
simulated time series show that, excluding the time for 
the estimation of needed parameters, ARK complexity is 
generally 0(7), while NPE complexity is 0(l 2 ) [30], where 
/ is the time series length. Fig. 0] shows the CPU time 
in seconds versus the length of the series, for both NPE 
and ARK applied to Lorenz models of increasing number 
of samples: complexity is 0(; 1 - 96 ± 02 ) fo r the former and 
0(7°- 96±0 - 03 ) for the latter. 
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Figure 4: CPU time in seconds versus the length of the series, 
for both NPE and ARK applied to Lorenz models of increasing 
number of samples. 

In this Letter, we have introduced ARK analysis, a new 
procedure to detect nonlinearity and nonstationarity in a 
time series. We have tested the discrimination power of 
ARK on a wide variety of short and noisy time series, ei- 
ther from models and from real data sets. A series of the 
far infrared laser emission in a chaotic state and a sam- 
ple of the intracranial EEG recordings during epileptic 
seizure, have been analyzed: in both case our procedure 
was able to detect, with high confidence level, the presence 
of nonlinearity Using Barnes models, supposed to repli- 
cate the behavior of the annual sunspots index, we have 
obtained a statistically significant rejection of the null hy- 
pothesis of an underlying stationary linear stochastic pro- 
cess, possibly mapped by a nonlinear function, for the his- 
torical time series. Using surrogates, we have obtained a 
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statistically significant rejection of the null hypothesis of 
an underlying stationary linear stochastic process, for the 
monthly smoothed sunspot index time series. Although 
no particular model for sunspots index has been proposed 
here, the presented results are an important step for the 
comprehension of the underlying solar cycle mechanisms. 
We believe that our method can be successfully applied to 
other interesting real-world time series whose underlying 
dynamics is still debated. 
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